Improving case definition of Crohn's disease and ulcerative colitis in electronic medical records using natural language processing: a novel informatics approach.

Authors

  • Ashwin N Ananthakrishnan
  • Tianxi Cai
  • Guergana Savova
  • Su-Chun Cheng
  • Pei Chen
  • Raul Guzman Perez
  • Vivian S Gainer
  • Shawn N Murphy
  • Peter Szolovits
  • Zongqi Xia
  • Stanley Shaw
  • Susanne Churchill
  • Elizabeth W Karlson
  • Isaac Kohane
  • Robert M Plenge
  • Katherine P Liao
Abstract

BACKGROUND: Previous studies identifying patients with inflammatory bowel disease using administrative codes have yielded inconsistent results. Our objective was to develop a robust electronic medical record-based model for classification of inflammatory bowel disease, leveraging the combination of codified data and information extracted from clinical text notes using natural language processing.

METHODS: Using the electronic medical records of 2 large academic centers, we created data marts for Crohn's disease (CD) and ulcerative colitis (UC) comprising patients with ≥1 International Classification of Diseases, 9th edition, code for each disease. We used codified data (i.e., International Classification of Diseases, 9th edition codes and electronic prescriptions) and narrative data from clinical notes to develop our classification model. Model development and validation were performed in a training set of 600 randomly selected patients for each disease, with medical record review as the gold standard. Logistic regression with the adaptive LASSO penalty was used to select informative variables.

RESULTS: We confirmed 399 CD cases (67%) in the CD training set and 378 UC cases (63%) in the UC training set. For both diseases, a combined model including narrative and codified data had better accuracy (area under the curve 0.95 for CD; 0.94 for UC) than models using only disease International Classification of Diseases, 9th edition codes (area under the curve 0.89 for CD; 0.86 for UC). Addition of natural language processing narrative terms to our final model resulted in classification of 6% to 12% more subjects with the same accuracy.

CONCLUSIONS: Inclusion of narrative concepts identified using natural language processing improves the accuracy of electronic medical record case definition for CD and UC while simultaneously identifying more subjects compared with models using codified data alone.
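The abstract compares models by area under the curve (AUC). As a rough illustration of what that metric measures, the sketch below computes AUC as the fraction of (case, non-case) pairs in which the chart-confirmed case receives the higher model score. The function name `auc`, the variable names, and all scores and labels are hypothetical values chosen for illustration; they are not data or code from the study.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    Returns the fraction of (case, non-case) pairs in which the case
    receives the higher score; tied scores count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data, NOT from the study: 1 = case confirmed by chart review.
labels = [1, 1, 1, 0, 0, 0]
# Assumed scores from a model using diagnosis codes alone.
codes_only_score = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
# Assumed scores from a model that also uses NLP-derived note mentions.
combined_score = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]

auc_codes = auc(codes_only_score, labels)
auc_combined = auc(combined_score, labels)
print(f"codes only: {auc_codes:.2f}, combined: {auc_combined:.2f}")
```

In this toy example the combined scores rank every case above every non-case, while the codes-only scores misrank two of the nine pairs. The pairwise loop is quadratic in the number of patients; for large cohorts a rank-based implementation from a statistics library would be a more practical choice.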

Similar resources

The Study of Upper Gastrointestinal Endoscopy in Patients with Inflammatory Bowel Disease and Ulcerative Colitis

Background and aims: Upper gastrointestinal endoscopy is one of the methods used in diagnosing inflammatory bowel disease, and it also helps in the differential diagnosis of unspecified colitis. The aim of this study was to investigate the necessity of upper gastrointestinal endoscopy in patients with inflammatory bowel disease.   Materials and Methods: In this descriptive cross-sectional...

Improving Case Definition of Crohns Disease and Ulcerative Colitis in Electronic Medical Records Using Natural Language Processing

Introduction—Prior studies identifying patients with inflammatory bowel disease (IBD) utilizing administrative codes have yielded inconsistent results. Our objective was to develop a robust electronic medical record (EMR) based model for classification of IBD leveraging the combination of codified data and information from clinical text notes using natural language processing (NLP). Methods—Usi...

Pyostomatitis Vegetant: An Important Diagnostic in Oral Diseases (Two Case Reports)

Background and Objectives: Inflammatory bowel disease (IBD) is a term that refers to Crohn's disease (CD) and ulcerative colitis (UC). Oral manifestations in this disease category can precede the onset of gastrointestinal symptoms. In many patients, intestinal symptoms may be minimal or remain undiagnosed. In this paper, two cases of pyostomatitis vegetans are reported.   Case Report: T...

Serum Interleukin-23 Levels in Patients with Ulcerative Colitis

Background: Patients with ulcerative colitis are at increased risk of inflammation. Interleukin 23 (IL-23) is a newly identified cytokine with increased expression in inflamed biopsies of colon mucosa in patients with Crohn's disease; however, there is inconsistent evidence on its role in ulcerative colitis. Objective: We aimed to compare serum IL-23 level in patients with ulcerative colitis an...

Utility And Metrics Of Natural Language Processing On Identifying Patients For Pharmacoepidemiologic Studies.

Objective Electronic medical records (EMR) are increasingly utilized in clinical practice and research, allowing for more efficient availability of rich patient records. However, most use of EMR is limited to coded, structured, administrative data, while the vast majority of patient information (e.g. disease subtype, severity, medical device usage, etc.) is tied up in narrative clinical notes. ...

Journal:
  • Inflammatory bowel diseases

Volume 19, Issue 7

Pages: -

Publication date: 2013